NSF PAR Search | NSF Public Access Repository

Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes

Qiao, Dan; Zhang, Kaiqi; Singh, Esha; Soudry, Daniel; Wang, Yu-Xiang (October 2024, Advances in neural information processing systems)

We study the generalization of two-layer ReLU neural networks in a univariate nonparametric regression problem with noisy labels. This is a problem where kernels (e.g., NTK) are provably sub-optimal and benign overfitting does not happen, thus disqualifying existing theory for interpolating (0-loss, global optimal) solutions. We present a new theory of generalization for local minima that gradient descent with a constant learning rate can stably converge to. We show that gradient descent with a fixed learning rate $\eta$ can only find local minima that represent smooth functions with a certain weighted first-order total variation bounded by $1/\eta - 1/2 + \tilde{O}(\sigma + \sqrt{\mathrm{MSE}})$, where $\sigma$ is the label noise level, MSE is short for mean squared error against the ground truth, and $\tilde{O}(\cdot)$ hides a logarithmic factor. Under mild assumptions, we also prove a nearly-optimal MSE bound of $\tilde{O}(n^{-4/5})$ within the strict interior of the support of the $n$ data points. Our theoretical results are validated by extensive simulation that demonstrates that large learning rate training induces sparse linear spline fits. To the best of our knowledge, we are the first to obtain a generalization bound via minima stability in the non-interpolation case and the first to show that ReLU NNs without regularization can achieve near-optimal rates in nonparametric regression.
Full Text Available
Nonparametric Classification on Low Dimensional Manifolds using Overparameterized Convolutional Residual Networks

Zhang, Zixuan; Zhang, Kaiqi; Chen, Minshuo; Takeda, Yuma; Wang, Mengdi; Zhao, Tuo; Wang, Yu-Xiang (October 2024, Advances in neural information processing systems)

Convolutional residual neural networks (ConvResNets), though overparameterized, can achieve remarkable prediction performance in practice, which cannot be well explained by conventional wisdom. To bridge this gap, we study the performance of ConvResNeXts, which cover ConvResNets as a special case, trained with weight decay from the perspective of nonparametric classification. Our analysis allows for infinitely many building blocks in ConvResNeXts, and shows that weight decay implicitly enforces sparsity on these blocks. Specifically, we consider a smooth target function supported on a low-dimensional manifold, then prove that ConvResNeXts can adapt to the function smoothness and low-dimensional structures and efficiently learn the function without suffering from the curse of dimensionality. Our findings partially justify the advantage of overparameterized ConvResNeXts over conventional machine learning models.
Full Text Available
Deep Learning meets Nonparametric Regression: Are Weight-Decayed DNNs Locally Adaptive?

Zhang, Kaiqi; Wang, Yu-Xiang (February 2023, International Conference on Learning Representations (ICLR-23))

We study the theory of neural networks (NNs) through the lens of classical nonparametric regression problems, with a focus on NNs' ability to adaptively estimate functions with heterogeneous smoothness -- a property of functions in Besov or Bounded Variation (BV) classes. Existing work on this problem requires tuning the NN architecture based on the function spaces and sample sizes. We consider a "Parallel NN" variant of deep ReLU networks and show that the standard weight decay is equivalent to promoting the $\ell_p$-sparsity ($0 < p < 1$) of the coefficient vector of an end-to-end learned function basis, i.e., a dictionary. Using this equivalence, we further establish that by tuning only the weight decay, such a Parallel NN achieves an estimation error arbitrarily close to the minimax rates for both the Besov and BV classes. Notably, it gets exponentially closer to minimax optimal as the NN gets deeper. Our research sheds new light on why depth matters and how NNs are more powerful than kernel methods.
Full Text Available
General-Purpose Bayesian Tensor Learning With Automatic Rank Determination and Uncertainty Quantification

https://doi.org/10.3389/frai.2021.668353

Zhang, Kaiqi; Hawkins, Cole; Zhang, Zheng (January 2022, Frontiers in Artificial Intelligence)

A major challenge in many machine learning tasks is that the model expressive power depends on model size. Low-rank tensor methods are an efficient tool for handling the curse of dimensionality in many large-scale machine learning models. The major challenges in training a tensor learning model include how to process the high-volume data, how to determine the tensor rank automatically, and how to estimate the uncertainty of the results. While existing tensor learning focuses on a specific task, this paper proposes a generic Bayesian framework that can be employed to solve a broad class of tensor learning problems such as tensor completion, tensor regression, and tensorized neural networks. We develop a low-rank tensor prior for automatic rank determination in nonlinear problems. Our method is implemented with both stochastic gradient Hamiltonian Monte Carlo (SGHMC) and Stein Variational Gradient Descent (SVGD). We compare the automatic rank determination and uncertainty quantification of these two solvers. We demonstrate that our proposed method can determine the tensor rank automatically and can quantify the uncertainty of the obtained results. We validate our framework on tensor completion tasks and tensorized neural network training tasks.
Full Text Available
Sparse Tucker Tensor Decomposition on a Hybrid FPGA–CPU Platform

https://doi.org/10.1109/TCAD.2020.3032626

Jiang, Weiyun; Zhang, Kaiqi; Lin, Colin Yu; Xing, Feng; Zhang, Zheng (September 2021, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems)

Full Text Available
3U-EdgeAI: Ultra-Low Memory Training, Ultra-Low Bitwidth Quantization, and Ultra-Low Latency Acceleration

https://doi.org/10.1145/3453688.3461738

Chen, Yao; Hawkins, Cole; Zhang, Kaiqi; Zhang, Zheng; Hao, Cong (June 2021, Great Lakes Symposium on VLSI)

Full Text Available
On-FPGA training with ultra memory reduction: A low-precision tensor method

Zhang, Kaiqi; Hawkins, Cole; Zhang, Xiyuan; Hao, Cong; Zhang, Zheng (May 2021, ICLR Workshop on Hardware Aware Efficient Training)

Various hardware accelerators have been developed for energy-efficient and real-time inference of neural networks on edge devices. However, most training is done on high-performance GPUs or servers, and the huge memory and computing costs prevent training neural networks on edge devices. This paper proposes a novel tensor-based training framework, which offers orders-of-magnitude memory reduction in the training process. We propose a novel rank-adaptive tensorized neural network model, and design a hardware-friendly low-precision algorithm to train this model. We present an FPGA accelerator to demonstrate the benefits of this training method on edge devices. Our preliminary FPGA implementation achieves 59× speedup and 123× energy reduction compared to embedded CPU, and 292× memory reduction over a standard full-size training.
Full Text Available
A Systematic DNN Weight Pruning Framework Using Alternating Direction Method of Multipliers

Zhang, Tianyun; Ye, Shaokai; Zhang, Kaiqi; Tang, Jian; Wen, Wujie; Fardad, Makan; Wang, Yanzhi (September 2018, 2018 the 15th European Conference on Computer Vision (ECCV))

Full Text Available
Hierarchical Learning for Cognitive End-to-End Service Provisioning in Multi-Domain Autonomous Optical Networks

https://doi.org/10.1109/JLT.2018.2883898

Liu, Gengchen; Zhang, Kaiqi; Chen, Xiaoliang; Lu, Hongbo; Guo, Jiannan; Yin, Jie; Proietti, Roberto; Zhu, Zuqing; Yoo, S. J. (January 2019, Journal of Lightwave Technology)

Full Text Available
